calculating the distance between the original
waveform and each of the synthetic waveforms based on the
corresponding alignments.

3. The program storage device of claim 2, 
wherein the instructions for performing the comparing step 
further include instructions for: 

retrieving feature vectors corresponding to the 
original waveform; and 

generating feature vectors for each synthetic 
waveform such that the feature vectors for the synthetic 
waveforms are similar in structure to the feature vectors of 
the original waveform; 

wherein the alignment is performed by 
time-aligning the feature vectors of the original waveform 
and the feature vectors of each synthetic waveform with the 
corresponding one of the N text sequences. 

4. The program storage device of claim 2, 
wherein the alignment is performed using a Viterbi alignment
process.

5. The program storage device of claim 2, 
wherein the alignment is performed on a phoneme level.

6. The program storage device of claim 2,
wherein the instructions for calculating the distance 
include instructions for performing the steps of: 

calculating an individual distance between each 
aligned frame of the original waveform and each of the N 
synthetic waveforms; and 

summing the individual distances of the aligned 
frames of the original waveform and each synthetic waveform. 
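
As an illustration of the distance calculation recited in claim 6, the following Python sketch sums per-frame distances over frame pairs that are assumed to be aligned already. The Euclidean metric, the function names, and the representation of a frame as a list of floats are assumptions made for illustration; the claim does not specify them.

```python
import math
from typing import List, Sequence, Tuple

Frame = Sequence[float]  # hypothetical: one feature vector per aligned frame


def frame_distance(a: Frame, b: Frame) -> float:
    """Euclidean distance between two feature vectors (assumed metric)."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def waveform_distance(aligned: List[Tuple[Frame, Frame]]) -> float:
    """Sum the individual distances over all aligned frame pairs."""
    return sum(frame_distance(orig, synth) for orig, synth in aligned)


# Toy data: two aligned (original, synthetic) frame pairs.
aligned_pairs = [([0.0, 0.0], [3.0, 4.0]), ([1.0, 1.0], [1.0, 1.0])]
total = waveform_distance(aligned_pairs)
```

Any other per-frame metric could be substituted for `frame_distance` without changing the summation step.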

7. The program storage device of claim 1, 
wherein the instructions for performing the comparing step 
include instructions for performing the steps of: 

(a) setting a parameter N=1;

(b) retrieving the Nth synthetic waveform and the
corresponding Nth text sequence; 

(c) time-aligning frames of the original waveform 
and frames of the Nth synthetic waveform to corresponding 
text of the Nth text sequence; 

(d) computing an individual distance between each 
corresponding aligned frame of the original and Nth 
synthetic waveform; 

(e) summing the individual distances to compute 
the distance between the original waveform and the Nth 
synthetic waveform;

(f) determining if the computed distance is less
than a current best distance value;

(g) setting the current best distance value equal
to the computed distance and saving the Nth text sequence 
for consideration as the final output, if the computed 
distance is determined to be less than the current best 
distance threshold; 

(h) incrementing the parameter N by one; and 

(i) repeating steps (b) through (h) until each of 
the N text sequences have been considered. 
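
Steps (a) through (i) of claim 7 can be sketched as a simple loop. The sketch below is a reading of the claim language and not the applicant's implementation: the name `select_best_hypothesis`, the `distance_fn` callable standing in for steps (c) to (e), and the initialization of the best distance to infinity are all assumptions. The claim uses N both as the number of hypotheses and as the loop index; the code uses a separate index `n`.

```python
from typing import Callable, List, Optional, Tuple


def select_best_hypothesis(
    original: object,
    candidates: List[Tuple[object, str]],
    distance_fn: Callable[[object, object, str], float],
) -> Optional[str]:
    """Return the text whose synthetic waveform is closest to the original.

    candidates holds (synthetic_waveform, text_sequence) pairs.
    distance_fn is a hypothetical callable covering steps (c) to (e):
    alignment, per-frame distances, and their sum.
    """
    best_distance = float("inf")  # assumed initial "current best distance value"
    best_text: Optional[str] = None
    n = 1  # step (a)
    while n <= len(candidates):  # step (i): stop once every hypothesis is seen
        synthetic, text = candidates[n - 1]  # step (b)
        distance = distance_fn(original, synthetic, text)  # steps (c) to (e)
        if distance < best_distance:  # step (f)
            best_distance = distance  # step (g): update best value, save text
            best_text = text
        n += 1  # step (h)
    return best_text


# Toy usage: "waveforms" are plain numbers, distance is their absolute difference.
toy = [(4.0, "wreck a nice beach"), (1.5, "recognize speech"), (3.0, "recognise peach")]
best = select_best_hypothesis(1.0, toy, lambda o, s, _t: abs(o - s))
```

In the toy usage the second hypothesis has the smallest distance, so its text is the one retained.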

8. The program storage device of claim 7, 
wherein the instructions for performing the step of 
determining the individual distance (step d) include
instructions for:

computing a mean feature vector of all feature
vectors comprising each aligned frame for both the original
and Nth synthetic waveform, wherein the individual distance
for each aligned frame is calculated by determining a
distance between each mean of the corresponding aligned
frames.
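
One possible reading of the mean-vector distance in claim 8 is sketched below: the feature vectors belonging to an aligned frame are averaged on each side, and the distance is taken between the two means. The Euclidean metric and the names `mean_vector` and `segment_distance` are assumptions; the claim only requires "a distance" between the means.

```python
import math
from typing import List, Sequence


def mean_vector(vectors: List[Sequence[float]]) -> List[float]:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    if not vectors:
        raise ValueError("at least one feature vector is required")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def segment_distance(
    orig_vectors: List[Sequence[float]],
    synth_vectors: List[Sequence[float]],
) -> float:
    """Distance between the means of two aligned segments (Euclidean assumed)."""
    m_orig = mean_vector(orig_vectors)
    m_synth = mean_vector(synth_vectors)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(m_orig, m_synth)))


# Toy data: two original vectors and one synthetic vector for the same aligned frame.
d = segment_distance([[0.0, 0.0], [2.0, 2.0]], [[4.0, 5.0]])
```

Averaging first lets the two sides contribute different numbers of feature vectors to the same aligned frame, which a vector-by-vector comparison would not allow.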

9. A method for rescoring N-best hypotheses of a 
decoded original waveform output from an automatic speech
recognition system, the N-best hypotheses comprising N text
sequences, the method comprising the steps of: 

generating a synthetic waveform for each of the N 
text sequences; 

comparing each synthetic waveform with the 
original waveform to determine the synthetic waveform that 
is closest to the original waveform; and 

selecting for output the text sequence 
corresponding to the synthetic waveform determined to be 
closest to the original waveform. 

10. The method of claim 9, wherein the comparing 
step includes the steps of: 

aligning frames of the original waveform and 
frames of each synthetic waveform to a corresponding one of 
the N text sequences; and 

calculating the distance between the original 
waveform and each of the synthetic waveforms based on the 
corresponding alignments.

11. The method of claim 10, wherein the comparing
step further includes the steps of: 

retrieving feature vectors corresponding to the 
original waveform; and

generating feature vectors for each synthetic
waveform such that the feature vectors for the synthetic
waveforms are similar in structure to the feature vectors of
the original waveform; 

wherein the alignment is performed by 
time-aligning the feature vectors of the original waveform 
and the feature vectors of each synthetic waveform with the 
corresponding one of the N text sequences. 

12. The method of claim 10, wherein the step of 
calculating the distance includes the steps of: 

calculating an individual distance between each 
aligned frame of the original waveform and each of the N 
synthetic waveforms; and 

summing the individual distances of the aligned 
frames of the original waveform and each synthetic waveform. 

13. The method of claim 9, wherein the comparing 
step includes the steps of:

(a) setting a parameter N=1;

(b) retrieving the Nth synthetic waveform and the
corresponding Nth text sequence;

(c) time-aligning frames of the original waveform
and frames of the Nth synthetic waveform to corresponding
text of the Nth text sequence; 

(d) computing an individual distance between each 
corresponding aligned frame of the original and Nth 
synthetic waveform; 

(e) summing the individual distances to compute 
the distance between the original waveform and the Nth 
synthetic waveform; 

(f) determining if the computed distance is less 
than a current best distance value; 

(g) setting the current best distance value equal 
to the computed distance and saving the Nth text sequence 
for consideration as the final output, if the computed 
distance is determined to be less than the current best 
distance threshold; 

(h) incrementing the parameter N by one; and 

(i) repeating steps (b) through (h) until each of 
the N text sequences have been considered. 

14. The method of claim 13, wherein the step of
determining the individual distance (step d) includes the
steps of:

computing a mean feature vector of all feature
vectors comprising each aligned frame for both the original
and Nth synthetic waveform, wherein the individual distance
for each aligned frame is calculated by determining a
distance between each mean of the corresponding aligned
frames.

15. An automatic speech recognition system, 
comprising: 

a decoder for decoding an original waveform of 
acoustic utterances to produce N text sequences, the N text 
sequences representing N-best hypotheses of the decoded 
original waveform; 

a waveform generator for generating a synthetic 
waveform for each of the N text sequences; and 

a comparator for comparing each synthetic waveform 
with the original waveform to rescore the N-best hypotheses.

16. The system of claim 15, further comprising a
feature analysis processor adapted to generate a set of
feature vectors for the original waveform and generate a set
of feature vectors for each of the N synthetic waveforms
using a similar feature analysis process.

17. The system of claim 15, further comprising a
processor adapted to process one of the original waveform,
the synthetic waveforms, and both, to compensate for
speaker-dependent variations.

18. The system of claim 15, wherein the 
comparator comprises: 

means for determining the synthetic waveform that 
is closest in distance to the original waveform; and 

means for outputting the N text sequence 
corresponding to the synthetic waveform that is determined 
to be closest to the original waveform.

19. The system of claim 18, wherein the means for
determining the closest synthetic waveform utilizes one of a 
distance score, a language model score, an acoustic model 
score, and a combination thereof, for determining the 
closest distance. 
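
Claim 19 allows the closeness decision to use a distance score, a language model score, an acoustic model score, or a combination. The claim does not say how a combination is formed; the sketch below assumes a weighted sum of costs in which lower is better. The `Hypothesis` fields, the weights, and the function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hypothesis:
    text: str
    distance: float  # waveform distance to the original (lower is better)
    lm_cost: float   # hypothetical: negative log language model probability
    am_cost: float   # hypothetical: negative log acoustic model likelihood


def combined_cost(h: Hypothesis, w_dist: float = 1.0,
                  w_lm: float = 0.0, w_am: float = 0.0) -> float:
    """Weighted sum of the three costs (assumed form of the combination)."""
    return w_dist * h.distance + w_lm * h.lm_cost + w_am * h.am_cost


def pick(hyps: List[Hypothesis], **weights: float) -> str:
    """Return the text of the hypothesis with the lowest combined cost."""
    return min(hyps, key=lambda h: combined_cost(h, **weights)).text


hyps = [Hypothesis("a", 2.0, 1.0, 1.0), Hypothesis("b", 1.0, 5.0, 1.0)]
by_distance = pick(hyps)                   # distance score only
by_mix = pick(hyps, w_dist=1.0, w_lm=1.0)  # distance plus language model cost
```

With the default weights only the distance score is used; setting a nonzero `w_lm` or `w_am` brings in the other scores, and in the toy data this changes which hypothesis is selected.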

20. The system of claim 18, wherein the means for 
determining the closest synthetic waveform comprises:

means for aligning frames of the original waveform
and frames of each synthetic waveform to a corresponding one
of the N text sequences; and

means for calculating the distance between the
original waveform and each of the synthetic waveforms based 
on the corresponding alignments. 

21. The system of claim 20, wherein the frames 
are aligned on a phoneme level. 

22. The system of claim 20, wherein the means for 
calculating the distance comprises: 

means for calculating an individual distance 
between each aligned frame of the original waveform and each 
of the N synthetic waveforms; and 

means for summing the individual distances of the
aligned frames of the original waveform and each synthetic
waveform to compute the distance between the original
waveform and each synthetic waveform.



YO999-046 PJO (8728-252) 


